Inferring missing genotypes in large SNP panels using fast nearest-neighbor searches over sliding windows

Authors

  • Adam Roberts
  • Leonard McMillan
  • Wei Wang
  • Joel Parker
  • Ivan Rusyn
  • David Threadgill
Abstract

MOTIVATION: Typical high-throughput genotyping techniques produce numerous missing calls that confound subsequent analyses, such as disease association studies. Common remedies for this problem include removing affected markers and/or samples or, otherwise, imputing the missing data. On small marker sets, imputation is frequently based on a vote of the K-nearest-neighbor (KNN) haplotypes, but this technique is neither practical nor justifiable for large datasets.

RESULTS: We describe a data structure that supports efficient KNN queries over arbitrarily sized, sliding haplotype windows, and evaluate its use for genotype imputation. The performance of our method enables exhaustive exploration over all window sizes and known sites in large (150K, 8.3M) SNP panels. We also compare the accuracy and performance of our methods with competing imputation approaches.

AVAILABILITY: A free open source software package, NPUTE, is available at http://compgen.unc.edu/software, for non-commercial uses.
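
To make the KNN vote mentioned in the abstract concrete, here is a minimal brute-force sketch in Python. It is not NPUTE's data structure or algorithm, which the abstract does not detail. The encoding (0/1 alleles, -1 for a missing call), the function names, and the tie-breaking rule are all assumptions made for illustration. For each missing call it rescans every other sample over the window, which is the kind of per-query cost an efficient index would aim to avoid.

```python
from collections import Counter

MISSING = -1  # assumed encoding for a missing call (not from the paper)


def window_distance(a, b, lo, hi, skip):
    """Count mismatches between rows a and b over SNPs lo..hi-1, ignoring
    column `skip` and any position where either call is missing."""
    return sum(
        1
        for i in range(lo, hi)
        if i != skip and a[i] != MISSING and b[i] != MISSING and a[i] != b[i]
    )


def impute_knn_window(haps, half_width, k):
    """Return a copy of `haps` with missing calls filled by a KNN vote.

    haps: list of equal-length rows (samples x SNPs) with values 0, 1 or MISSING.
    half_width: number of SNPs taken on each side of the missing site.
    """
    n_snps = len(haps[0])
    out = [row[:] for row in haps]
    for s, row in enumerate(haps):
        for j, call in enumerate(row):
            if call != MISSING:
                continue
            lo, hi = max(0, j - half_width), min(n_snps, j + half_width + 1)
            # Candidate neighbors: other samples with a known call at SNP j,
            # ranked by mismatch count over the window (original data only).
            ranked = sorted(
                (window_distance(row, other, lo, hi, j), t)
                for t, other in enumerate(haps)
                if t != s and other[j] != MISSING
            )
            nearest = [haps[t][j] for _, t in ranked[:k]]
            if not nearest:
                continue  # no informative neighbor; leave the call missing
            votes = Counter(nearest).most_common()
            tied = [allele for allele, n in votes if n == votes[0][1]]
            # Assumed tie-break: fall back to the single closest neighbor.
            out[s][j] = tied[0] if len(tied) == 1 else nearest[0]
    return out
```

A call such as `impute_knn_window(haps, half_width=2, k=3)` fills each missing site from the three samples that agree best with the target over the five-SNP window centered on that site. The mismatch count is not normalized for positions skipped because of missing data, which is a simplification.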

Similar resources

Improvement of missing genotype imputation through bi-directional parsing of large SNP panels
  • Christine Sinoquet

Difficult analyses such as disease association studies, which aim to map genetic variants underlying complex human diseases, rely on high-throughput genotyping techniques. However, a shortcoming of these techniques is the generation of missing calls. Computational inference of missing data is a challenging alternative to re-genotyping the missing regions. In this paper, we prese...

LinkImpute: Fast and Accurate Genotype Imputation for Nonmodel Organisms

Obtaining genome-wide genotype data from a set of individuals is the first step in many genomic studies, including genome-wide association and genomic selection. All genotyping methods suffer from some level of missing data, and genotype imputation can be used to fill in the missing data and improve the power of downstream analyses. Model organisms like human and cattle benefit from high-qualit...

Iterative Two-Pass Algorithm for Missing Data Imputation in SNP Arrays

Although the quality of high-throughput genotyping techniques continues to improve, missing data remain fairly common. Studies have shown that even a low percentage of missing SNPs is detrimental to the reliability of downstream analyses such as SNP-disease association tests. This paper investigates the potential for improving the accuracy of an SNP inference method based on the algorithm forme...

Shape Detection with Nearest Neighbour Contour Fragments

We present a novel method for shape detection in natural scenes based on incomplete contour fragments and nearest neighbour search. In contrast to popular methods which employ sliding windows, chamfer matching and SVMs, we characterise each contour fragment by a local descriptor and perform a fast nearest-neighbour search to find similar fragments in the training set. Based on this idea, we sho...

Imputing missing genotypes with weighted k nearest neighbors.

Missing values are a common problem in genetic association studies concerned with single-nucleotide polymorphisms (SNPs). Since many statistical methods cannot handle missing values, such values need to be removed prior to the actual analysis. Considering only complete observations, however, often leads to an immense loss of information. Therefore, procedures are required that can be used to im...

Journal:
  • Bioinformatics

Volume 23, Issue 13

Pages: -

Publication date: 2007